Stock Pitch and Trading Simulation

Final Project

Name: Kehang Zhou (kz653@nyu.edu)

Date: May 8, 2017

Introduction:

Stock Pitch has long been a popular topic among investors, either speculative or long-term. People always want to invest in a stock which could bring them gigantic and stable wealth. Fundamental analysis and technical analysis are the two main analysis branch in the equity market. Technical analysis approaches security from the chart and historical performance, while fundamental analysis approaches security from company's financial statements statistics.

This project is totally an exploration which aims to explore both of these two analysis and apply them in pitching stocks. I tried to explore the relationships between returns of stocks and statistics of stocks I got(both statistics of company and stock performance). I also approach the analysis from multiple dimensions besides statistics(sectors, trading volumes). In the final part, I tried to do 2 trading simulations(Back test) on specific strategies to see if we can turn our analysis to profitable trading ideas.

Outline:

Process of Deriving Data and Simple Data Cleaning (skippable and Unstable)
Data Analysis on Statistics
Trading Simulation and Analysis

Note that the first part is skippable. Also please don't run the cells simultaneously. The main parts of this notebook is based on the customized dataset I derived on April 15th. Please load the csv file in the Data Analysis part.



In [1]:

    
import requests
import sys                             # system module 
import pandas as pd                    # data package
import pandas_datareader.data as web
import datetime
import matplotlib.pyplot as plt        # graphics module  
import datetime as dt                  # date and time module
import numpy as np                     # foundation for pandas 
       
import seaborn as sns                  # seaborn graphic tools

from plotly.offline import iplot, iplot_mpl  # plotly plotting functions
import plotly.graph_objs as go               
import plotly                                

import statsmodels.formula.api as sm    #linear regression package
import statsmodels.api as sm
from pandas.stats.api import ols
%matplotlib inline             
plotly.offline.init_notebook_mode(connected=True)

Main Dataset

I found the list of S&P 500 companies dataset in the website . The file constituents-financial.csv contains symbol of the company, name of the company, sector of the company and a bunch of financial statistics about the companies such as dividend yield, price to books value and EBITDA. Those could be useful to me but I’m afraid that those are not the most up-to-date statistics. So I use another file constituents.csv which only contains symbols, names and sectors.



In [2]:

    
#read the data from github and use the second file
url = 'https://raw.githubusercontent.com/datasets/s-and-p-500-companies/master/data/constituents-financials.csv'
url2 = 'https://raw.githubusercontent.com/datasets/s-and-p-500-companies/master/data/constituents.csv'
sp = pd.read_csv(url2)



In [3]:

    
# The table looks like:
sp.head(8)









    Out[3]:






  
    
      
      Symbol
      Name
      Sector
    
  
  
    
      0
      MMM
      3M Company
      Industrials
    
    
      1
      ABT
      Abbott Laboratories
      Health Care
    
    
      2
      ABBV
      AbbVie
      Health Care
    
    
      3
      ACN
      Accenture plc
      Information Technology
    
    
      4
      ATVI
      Activision Blizzard
      Information Technology
    
    
      5
      AYI
      Acuity Brands Inc
      Industrials
    
    
      6
      ADBE
      Adobe Systems Inc
      Information Technology
    
    
      7
      AAP
      Advance Auto Parts
      Consumer Discretionary



In [4]:

    
sp.shape        #general shape of the dataset









    Out[4]:





(505, 3)



In [5]:

    
sp.dtypes       #datatypes of the dataset









    Out[5]:





Symbol    object
Name      object
Sector    object
dtype: object

Yahoo Finance Data Package

Here I Install Yahoo finance package in order to get access to latest stock data:

Please use pip install yahoo-finance in commander to intall the package

According to the python index instruction. I used multiple methods to get data by passing through the symbol of the stock.

Basically, this package could give you the latest statistics of companies and stocks in the Yahoo-finance summary page of a stock. Here is an example of the page



In [6]:

    
from yahoo_finance import Share                        #load the Share funciton



In [7]:

    
#Example: By putting an authorized symbol of stock in the share function, we could get the statsitics we want. 
#Below is demo to get the 50day_moving_average of BRK-B share.
Share('BRK-B').get_50day_moving_avg()









    Out[7]:





'166.89'



In [8]:

    
stocklist = list(sp['Symbol'].astype(str))    #get the symbol list with symbols as string
print(len(stocklist))
stocklist[0:5]                                # How the string









    



505






    Out[8]:





['MMM', 'ABT', 'ABBV', 'ACN', 'ATVI']

I discover that in our stocklist. Some symbol such as BRK.B contains a dot '.' . Yahoo Finance package only accepts symbol with '-' like BRK-B. So I cleaned the data by replacing '.' with '-'.



In [9]:

    
stocklist_a = []                                           # create a adjusted list of symbol
for stock in stocklist:
    slist = list(stock)                                    # use list function on a string
    slist2 = ['-' if x=='.' else x for x in slist]        # replace the '.' in the list with '-'  
    stock = "".join(slist2)                               # combine the list as a string
    stocklist_a.append(stock)                             # put the new symbol in the new list
sp['Symbol'] = stocklist_a                                #set the adjusted list as symbols in data



In [10]:

    
print(stocklist_a[0:10])                                  # how the adjusted list looks like
sp.head(7)









    



['MMM', 'ABT', 'ABBV', 'ACN', 'ATVI', 'AYI', 'ADBE', 'AAP', 'AES', 'AET']






    Out[10]:






  
    
      
      Symbol
      Name
      Sector
    
  
  
    
      0
      MMM
      3M Company
      Industrials
    
    
      1
      ABT
      Abbott Laboratories
      Health Care
    
    
      2
      ABBV
      AbbVie
      Health Care
    
    
      3
      ACN
      Accenture plc
      Information Technology
    
    
      4
      ATVI
      Activision Blizzard
      Information Technology
    
    
      5
      AYI
      Acuity Brands Inc
      Industrials
    
    
      6
      ADBE
      Adobe Systems Inc
      Information Technology

Then I used the for loop to get the specific statistics for each stock. Put all data in a list and then put the list in the dataframe

get_50day_moving_avg() - get the 50 days moving average of the stock price

get_200day_moving_avg() - get the 200 days moving average of the stock price

get_dividend_yield() - get the dividend yield of the stock

get_short_ratio() - get the short ratio of the stock

get_price_earnings_growth_ratio() -get the price to earning growth ratio(Peg) of the stock

I passed each one separately. Because every time I put the code together in one, it takes a super long time and it will have a error message. It's better if we run each one separately.

Warning: When I use the code to derive my dataset on April 15, 2016, everything was fine. However, when I execute the code in May, I always get error message saying HTTP error and the for loop interrupt. So if you run the cell below and get a error message, please don't executes the rest cells in the first part and just read the first part (Otherwise you'll get a lot error message). That's also why I clear the output of the dataset



In [11]:

    
list1 = []                                                                 # create an empty list to fill in the statistics.
for s in stocklist_a:                                                      # use the for loop to pass every symbol through
    a = Share(s).get_50day_moving_avg()
    list1.append(a)









    



---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-11-c3cb0ae75f24> in <module>()
      1 list1 = []                                                                 # create an empty list to fill in the statistics.
      2 for s in stocklist_a:                                                      # use the for loop to pass every symbol through
----> 3     a = Share(s).get_50day_moving_avg()
      4     list1.append(a)

D:\Program Files\Anaconda3\lib\site-packages\yahoo_finance\__init__.py in __init__(self, symbol)
    176         self._table = 'quotes'
    177         self._key = 'symbol'
--> 178         self.refresh()
    179 
    180     def _fetch(self):

D:\Program Files\Anaconda3\lib\site-packages\yahoo_finance\__init__.py in refresh(self)
    140 
    141         """
--> 142         self.data_set = self._fetch()
    143 
    144 

D:\Program Files\Anaconda3\lib\site-packages\yahoo_finance\__init__.py in _fetch(self)
    179 
    180     def _fetch(self):
--> 181         data = super(Share, self)._fetch()
    182         if data['LastTradeDate'] and data['LastTradeTime']:
    183             data[u'LastTradeDateTimeUTC'] = edt_to_utc('{0} {1}'.format(data['LastTradeDate'], data['LastTradeTime']))

D:\Program Files\Anaconda3\lib\site-packages\yahoo_finance\__init__.py in _fetch(self)
    132     def _fetch(self):
    133         query = self._prepare_query(table=self._table, key=self._key)
--> 134         data = self._request(query)
    135         return data
    136 

D:\Program Files\Anaconda3\lib\site-packages\yahoo_finance\__init__.py in _request(self, query)
    116 
    117     def _request(self, query):
--> 118         response = yql.YQLQuery().execute(query)
    119         try:
    120             _, results = response['query']['results'].popitem()

D:\Program Files\Anaconda3\lib\site-packages\yahoo_finance\yql.py in execute(self, yql, token)
     59             'q': yql,
     60             'format': 'json',
---> 61             'env': DATATABLES_URL
     62         }))
     63         return loads(req.read())

D:\Program Files\Anaconda3\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    161     else:
    162         opener = _opener
--> 163     return opener.open(url, data, timeout)
    164 
    165 def install_opener(opener):

D:\Program Files\Anaconda3\lib\urllib\request.py in open(self, fullurl, data, timeout)
    470         for processor in self.process_response.get(protocol, []):
    471             meth = getattr(processor, meth_name)
--> 472             response = meth(req, response)
    473 
    474         return response

D:\Program Files\Anaconda3\lib\urllib\request.py in http_response(self, request, response)
    580         if not (200 <= code < 300):
    581             response = self.parent.error(
--> 582                 'http', request, response, code, msg, hdrs)
    583 
    584         return response

D:\Program Files\Anaconda3\lib\urllib\request.py in error(self, proto, *args)
    508         if http_err:
    509             args = (dict, 'default', 'http_error_default') + orig_args
--> 510             return self._call_chain(*args)
    511 
    512 # XXX probably also want an abstract factory that knows when it makes

D:\Program Files\Anaconda3\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
    442         for handler in handlers:
    443             func = getattr(handler, meth_name)
--> 444             result = func(*args)
    445             if result is not None:
    446                 return result

D:\Program Files\Anaconda3\lib\urllib\request.py in http_error_default(self, req, fp, code, msg, hdrs)
    588 class HTTPDefaultErrorHandler(BaseHandler):
    589     def http_error_default(self, req, fp, code, msg, hdrs):
--> 590         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    591 
    592 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 500: Server Error



In [12]:

    
list1[0:5]                                        # How the list would look like if no error









    Out[12]:





['192.36', '44.09', '65.30', '119.5790', '50.10']



In [ ]:

    
sp['50 Days Moving Average'] = list1              # create a new columns and put the list 
sp.head(5)



In [ ]:

    
#The following codes did the similar things.
list2 = []
for s1 in stocklist_a:
    list2.append(Share(s1).get_200day_moving_avg())



In [ ]:

    
list3 = []
for s2 in stocklist_a:
    list3.append(Share(s2).get_dividend_yield())



In [ ]:

    
list4 = []
for s3 in stocklist_a:
    list4.append(Share(s3).get_short_ratio())



In [130]:

    
list5 = []
for s4 in stocklist_a:
    list5.append(Share(s4).get_price_earnings_growth_ratio())



In [131]:

    
sp['200 Days Moving Average'] = list2
sp['Dividend Yield'] = list3
sp['Short Ratio'] = list4
sp['Peg Ratio'] = list5

Get the holding period return data from November 1 2016 to April 15 2017 by using the pandas data reader

The reasons why I pick the time range from November 1,2016 to Aril 15, 2017 are:

It is roughly 6 month, half year.
Some statistics of companies like oeg ratio doesn't change frequently but seasonally or semi-annually according to their report. So the statistics we get should have effects on the stock price during the period.
US stock market has experienced a huge rally after the US presidency election on the start of November 2016.



In [ ]:

    
# create two list for the stock price on the first day and the stock price on the last day
startlist=[]
closelist=[]
for stock in stocklist_a:
    start = datetime.datetime(2016, 11, 1)
    end = datetime.datetime(2017, 4, 15)
    f = web.DataReader(stock, 'yahoo', start, end)        
    flist = f['Close'].tolist()                      #put the close price in the list
    startlist.append(flist[0])                       # add the close price on the first day to the startlist
    closelist.append(flist[-1])                      # add the close price on the last day to the startlist



In [ ]:

    
sp['Stock price on 2016-11-1'] = startlist           # set new columns to put two lists in as data
sp['Stock price on 2017-4-15'] = closelist
sp['Holding Period Return'] = sp['Stock price on 2017-4-15']/sp['Stock price on 2016-11-1'] -1   
# Calculate Holding period return data in this period and put it in to the dataset as a new column
sp.head()

Since every time I passed on the code I would get latest data from Yahoo Finance. SO I save the data derived on April 15 to csv file and load it again



In [22]:

    
#sp.to_csv('sp500_April_15.csv')                                          # Save the data to our computer, please don't pass

Data Analysis on Statistics

Main parts start from Here

Please load the dataset sp500_April_15.csv from the computer, you could replace the code in the cell below to read the csv file from certain path, I put the csv in the same path of this jupyter notebook.



In [14]:

    
spd = pd.read_csv('sp500_April_15.csv',                #the data derived on April 15th.
                 usecols = range(1,12))                # The first column became number automatically and we don't want it



In [15]:

    
spd.head()                                             # general look of the data set









    Out[15]:






  
    
      
      Symbol
      Name
      Sector
      50 Days Moving Average
      200 Days Moving Average
      Dividend Yield
      Short Ratio
      Peg Ratio
      Stock price on 2016-11-1
      Stock price on 2017-4-15
      Holding Period Return
    
  
  
    
      0
      MMM
      3M Company
      Industrials
      190.17
      178.69
      2.49
      4.03
      2.46
      164.250000
      188.649994
      0.148554
    
    
      1
      ABT
      Abbott Laboratories
      Health Care
      44.76
      41.60
      2.48
      1.65
      1.59
      39.049999
      42.669998
      0.092702
    
    
      2
      ABBV
      AbbVie
      Health Care
      64.69
      62.20
      3.99
      4.90
      0.80
      56.450001
      64.129997
      0.136050
    
    
      3
      ACN
      Accenture plc
      Information Technology
      121.49
      118.87
      2.11
      4.22
      2.08
      116.139999
      114.860001
      -0.011021
    
    
      4
      ATVI
      Activision Blizzard
      Information Technology
      48.50
      42.49
      0.62
      2.58
      1.54
      42.599998
      48.419998
      0.136620

Stock pitch by sectors

Stock data in specific sector



In [16]:

    
# spdi is our main dataset, but I also created some other dataset.
spdi = spd.set_index(['Symbol','Name'])  # set the multi-level index
spdi.head()









    Out[16]:






  
    
      
      
      Sector
      50 Days Moving Average
      200 Days Moving Average
      Dividend Yield
      Short Ratio
      Peg Ratio
      Stock price on 2016-11-1
      Stock price on 2017-4-15
      Holding Period Return
    
    
      Symbol
      Name
      
      
      
      
      
      
      
      
      
    
  
  
    
      MMM
      3M Company
      Industrials
      190.17
      178.69
      2.49
      4.03
      2.46
      164.250000
      188.649994
      0.148554
    
    
      ABT
      Abbott Laboratories
      Health Care
      44.76
      41.60
      2.48
      1.65
      1.59
      39.049999
      42.669998
      0.092702
    
    
      ABBV
      AbbVie
      Health Care
      64.69
      62.20
      3.99
      4.90
      0.80
      56.450001
      64.129997
      0.136050
    
    
      ACN
      Accenture plc
      Information Technology
      121.49
      118.87
      2.11
      4.22
      2.08
      116.139999
      114.860001
      -0.011021
    
    
      ATVI
      Activision Blizzard
      Information Technology
      48.50
      42.49
      0.62
      2.58
      1.54
      42.599998
      48.419998
      0.136620



In [17]:

    
spdi['Sector'].value_counts()    # count how many companies in each sector









    Out[17]:





Consumer Discretionary         86
Information Technology         68
Industrials                    66
Financials                     65
Health Care                    60
Consumer Staples               37
Energy                         35
Real Estate                    30
Utilities                      28
Materials                      25
Telecommunications Services     5
Name: Sector, dtype: int64



In [18]:

    
sector = list(spdi['Sector'].unique())    # find all the sectors name by using the unique function and put into list
sector









    Out[18]:





['Industrials',
 'Health Care',
 'Information Technology',
 'Consumer Discretionary',
 'Utilities',
 'Financials',
 'Materials',
 'Consumer Staples',
 'Real Estate',
 'Energy',
 'Telecommunications Services']



In [19]:

    
dic = dict(spdi['Sector'].value_counts())           # use the dict function to make the value_counts a dictionary
sector_name = [key for key in dic.keys()]           # the key of the dictionary is name of each sector, put into a list
sector_count = [v for v in dic.values()]            # the value of the dictionary is numbers of companies in each sector, put in a list 

# Use the plotly to visualize the composition of S&P 500 stocks, use pie plot
fig = {
    'data': [{'labels': sector_name,
              'values': sector_count,
              'type': 'pie', 
              'hoverinfo':'label+percent+value',
             'hole' : .4
             }],
    'layout': {'title': 'S&P500 Decomposition by Sector Pie Chart',
              'annotations':[{
                'font': {'size': 10},
                'showarrow': False,
                'text': 'Sector + # of Companies + Percentage',
                'x': 0.5,
                'y': 0.5
            }]
     }
}
iplot(fig)

Comment:

The S&P 500 contains companies from 11 sectors. Companies are distributed in these 11 sectors. But some sectors like Telecom and Material only contains a relative small amount of companies.

Below is simple bar plot to see the holding period return of companies inside 'Industrial' Sectors



In [20]:

    
sns.set_style("whitegrid")                                # set the seaborn plotting style
# first try, simple bar plot example
spd1 = spdi[spdi['Sector']== sector[0]]                   # a small subset spd1 by slicing
fig, ax = plt.subplots(figsize=(8, 10))
spd1['Holding Period Return'].plot.barh(ax=ax)
ax.set_title('Bar Chart of Holding Period Return of Industrial Sector')
ax.set_xlabel('Holding Period Return')









    Out[20]:





<matplotlib.text.Text at 0x290b0e035f8>

Below is the plot procedure of average holding period return for each sector. We also add a horizontal line to show the holding period return on the S&P 500 index. By doing that, we could see which sector outperforms the aggregate index.

Note that the symbol for S&P500 index in Yahoo finance is '^GSPC'



In [21]:

    
sns.set()                                                            # reset the seaborn plotting style to default

# The following paragraph shows us how to get the holding period return of S&P 500 index. 
start = datetime.datetime(2016, 11, 1)
end = datetime.datetime(2017, 4, 15)
sp500 = web.DataReader('^GSPC', 'yahoo', start, end)                   # sp500 is a dataset used to calculate the return of the index
sp500_price = sp500['Close'].tolist()
sp500_return = sp500_price[-1] / sp500_price[0] -1
print('The Holding Period Return for S&P500 index is',sp500_return)

# The following codes shows how to plot
ave = spdi['Holding Period Return'].groupby(spdi['Sector']).mean()      # Use the groupby method to get the mean HPR
ave = ave.reset_index()
ave = ave.set_index('Sector')                                           # reset and then set the 'Sector' as index
# plot
fig, ax = plt.subplots(figsize = (10,6))
ave.plot(ax = ax, kind='bar',alpha = 0.3)
ax.axhline(sp500_return, linestyle = 'dashed', color = 'red')
ax.axhline(0,linestyle = '--', color ='black')
ax.set_title('Holding Period Return Sector Average', fontsize=15)
ax.text(0,sp500_return,'S&P500 Holding Period Return', fontsize=9)









    



The Holding Period Return for S&P500 index is 0.102868743481






    Out[21]:





<matplotlib.text.Text at 0x290b140fd68>

Comment:

From the plot, we could see that there are five sectors outperform the index(Financials, Health Care, Industrials, Information Technology, Materials) over the last half year.The only one sector that suffered from losses is the Telecom sector with least amount of companies.Consumer Staples and Energy Sectors also performs badly. So we might avoid bad sectors and pick good sectors in the following pitch.

Dividend Yield Pattern of Sectors

Here is graph of both boxplot and swarmplot of dividend yield of companies, classified by sectors in one canvas.

Before the plot, I have some comprehension of dividend yield. Dividend Yield, according to the definition, is a dividend expressed as a percentage of a current share price.

The companies with high profitability should have more disposable cash flow, so the dividend yield should be higher for them. Companies (such as tech companies and health care companies) which need a lot capital should be capital intensive and pay little dividend. But how big it is should also depend on company's operation strategy.

If a company pays a lot of dividend, of course shareholder would be happy because of the capital gain. It might attract a lot of new shareholders so the share price could go up. Return of the stock should also go up. But in the other hand, paying dividend unwisely could reduce company's disposable cash flow so the profitability should go down. This would cause negative effect on the stock price.

Let's see if we can capture the pattern of dividend yield of each sector.



In [22]:

    
# seaborn boxplot and swarmplot
fig,ax = plt.subplots(figsize=(18,18))
sns.boxplot(ax=ax,data=spdi, x='Sector', y='Dividend Yield')
sns.swarmplot(ax=ax,data=spdi, x='Sector', y='Dividend Yield',size =8, linewidth=1, alpha=0.5)
ax.set_ylim([0,10])
ax.set_title('Dividend Yield by Sector, Boxplot Plus Swarmplot', fontsize = 15)









    Out[22]:





<matplotlib.text.Text at 0x290b143a828>

Comment:

There are some notable facts in this graph.

Inside each sector, companies could have very different dividend yield even if they are in the same sector. Energy group is a good example for this point. The range of the dividend yield is wide.
There are some sectors which have high dividend yield in average. Utility and Real Estate sectors are two that have relatively high dividend yield. The only 5 companies in Telecom sector all pay super high dividend and they have really bad average holding period return. If we relate these with sector average holding period return above, the high dividend yield could be the reason why these sectors have underperformed the index and other sectors.
The sectors(Financials, Health Care, Industrials, Information Technology) which have outperformed the index all have relativlely low dividend yield in average. Take Health Care sector as an example. It has lowest average dividend yield(some health care companies don't pay dividend yield at all!). But it has highest average holding period return.

Data Analysis on all Statistics - Explore the correlation

I've already introduced the dividend yield. There are still some other statistics in the chart.

Definitions： PEG Ratio Short Ratio Moving Average

The Peg Ratio is a valuation metric for determining the relative trade-off between the price of a stock, the earnings generated per share (EPS), and the company's expected growth. It is derived by dividing the Price to earning ratio by the growth rate of earnings per share. High PEG ratio indicate that the company is overvalued too much compare to its growth, while a super low peg ratio might indicate that the company is undervalued. It could also be negative because the earnings and it's growth could be negative. A fair Peg raito should be around 1.
The short ratio is the ratio of tradable shares being shorted to shares in the market. Shorting stocks means that the investors is not optimistic about the stock. So I think a stock with relatively high short ratio should have lower return
Moving Average is a techinical analysis tool on the stock price. I used the simple moving average of stock price over last several days as the moving average. Usually, when the short term moving average is above the long term moving average, the stock should perform well.

In below, I add a column of 'sector average return' on the main dataset spdi. I also created a subset named sp_outperform by slicing the companies which outperform the sector average return. I use this dataset to do the data analysis, which gives me better and more visible correlation in below



In [23]:

    
# create subset of outperforming companies
spdi['Sector Average Return'] = spdi['Holding Period Return'].groupby(spdi['Sector']).transform('mean')
sp_outperform = spdi[spdi['Holding Period Return'] > spdi['Sector Average Return']]
sp_outperform.head()









    Out[23]:






  
    
      
      
      Sector
      50 Days Moving Average
      200 Days Moving Average
      Dividend Yield
      Short Ratio
      Peg Ratio
      Stock price on 2016-11-1
      Stock price on 2017-4-15
      Holding Period Return
      Sector Average Return
    
    
      Symbol
      Name
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      MMM
      3M Company
      Industrials
      190.17
      178.69
      2.49
      4.03
      2.46
      164.250000
      188.649994
      0.148554
      0.138182
    
    
      ATVI
      Activision Blizzard
      Information Technology
      48.50
      42.49
      0.62
      2.58
      1.54
      42.599998
      48.419998
      0.136620
      0.120163
    
    
      ADBE
      Adobe Systems Inc
      Information Technology
      124.94
      112.64
      0.00
      2.25
      2.03
      106.870003
      129.050003
      0.207542
      0.120163
    
    
      AET
      Aetna Inc
      Health Care
      129.54
      122.75
      1.55
      4.42
      1.09
      105.820000
      128.630005
      0.215555
      0.157534
    
    
      AMG
      Affiliated Managers Group Inc
      Financials
      163.76
      151.95
      0.49
      2.48
      0.78
      134.339996
      161.279999
      0.200536
      0.146524



In [24]:

    
#Use the seaborn pairplot to give a overview of the statistics and the holding period return
sns.set_style("whitegrid")
sp_outperform = sp_outperform.dropna(axis=0, how='any')                            # remove some stocks which has nan value in statistics
sns.pairplot(sp_outperform[['Dividend Yield', 'Short Ratio', 'Peg Ratio','Holding Period Return']], size = 3)









    Out[24]:





<seaborn.axisgrid.PairGrid at 0x290b12103c8>

Comment:

The pairplot was not so satisfied, but we get some conclusion.

Peg ratio might be a bad statistics for analyzing stock. In the pair plot, because of some really deviated Peg ratio. the pattern of the PEG ratio is like a straight line which tells nothing. I'll zoom in the detail of the plot in below
The short ratio has a weak positive correlation with dividend yield. This could be reasonable because as companies pay high dividend, investors would expect the stock price to go down because it hurts disposable cash flow and profitability of companies. So investors would short more stocks of the company.
Short ratio also has a weak negative correlation with holding period return. As short ratio gets super high(>8%), the average holding period return is lower than the one of companies with lower short ratio.
Dividend Yield seems to be uncorrelated with Holding period return. But we observe some super high return in companies with low dividend yield. This might suggest a weak negative correlation.



In [25]:

    
# Use the regplot to zoom in the graphs, I also did a linear regression with confidence interval 
fig, ax = plt.subplots(2,2,figsize = (12,12))
sns.regplot(x='Dividend Yield',y='Holding Period Return',data=sp_outperform,ax=ax[0,0], ci=95)
sns.regplot(x='Short Ratio',y='Holding Period Return',data=sp_outperform,ax=ax[0,1], ci=95)
sns.regplot(x='Peg Ratio',y='Holding Period Return',data=sp_outperform,ax=ax[1,0],ci=95)
ax[1,0].set_ylim([-1,1])
ax[1,0].set_xlim([-5,5])
sns.regplot(x='Dividend Yield',y='Short Ratio',data=sp_outperform,ax=ax[1,1], ci=95)









    Out[25]:





<matplotlib.axes._subplots.AxesSubplot at 0x290b40458d0>

These four plot proves my thoughts in above. The Peg Ratio is not a good measure.And there is a negative correlation between dividend yield and return. All correlations are not strong so it is not satisfying.

Below I did a linear regression to model the relationship between the dividend yield and holding period return. Just to show the negative correlation coefficient.



In [26]:

    
# Load sklearn.linear_model to fit the linear regression and get the coefficients
from sklearn.linear_model import LinearRegression
splm = sp_outperform[['Holding Period Return','Dividend Yield' ]]
npmatrix = np.matrix(splm)
Y, X = npmatrix[:,0], npmatrix[:,1]
lreg = LinearRegression().fit(X,Y)
m = lreg.coef_[0]
c = lreg.intercept_
print('The formula of Model is: Holding Period Return = {0}*Dividend Yield + {1}'.format(m,c))









    



The formula of Model is: Holding Period Return = [-0.02793481]*Dividend Yield + [ 0.24026721]

Exploring Moving Average

Here, to identify whether the 50 days moving average is higher than 200days moving average in April 15,2017, I create a subset of spdi called spdi_MA. I create a new columns which contains booleans to signal the moving average relationship. True means 50 days moving average is higher than 200 days moving average. False means the opposite.



In [27]:

    
spdi_MA = spdi[['Sector','50 Days Moving Average', '200 Days Moving Average', 'Holding Period Return']].copy()    #copy the subset of spdi
spdi_MA['MA Signal'] = (spdi['50 Days Moving Average'] > spdi['200 Days Moving Average'])            # boolean selection as new column
spdi_MA.head()









    Out[27]:






  
    
      
      
      Sector
      50 Days Moving Average
      200 Days Moving Average
      Holding Period Return
      MA Signal
    
    
      Symbol
      Name
      
      
      
      
      
    
  
  
    
      MMM
      3M Company
      Industrials
      190.17
      178.69
      0.148554
      True
    
    
      ABT
      Abbott Laboratories
      Health Care
      44.76
      41.60
      0.092702
      True
    
    
      ABBV
      AbbVie
      Health Care
      64.69
      62.20
      0.136050
      True
    
    
      ACN
      Accenture plc
      Information Technology
      121.49
      118.87
      -0.011021
      True
    
    
      ATVI
      Activision Blizzard
      Information Technology
      48.50
      42.49
      0.136620
      True



In [28]:

    
# Use the violinplot to plot the holding period return for each sectors. 
# Note that I use the hue function on 'MA signal'. so we get two pattern for stocks in each sectors.
sns.set(style="whitegrid", palette="pastel", color_codes=True)
fig, ax = plt.subplots(figsize=(17,16))
sns.violinplot(ax=ax, x= 'Sector', y ='Holding Period Return',hue='MA Signal', data = spdi_MA, edgecolor='gray', inner='quart', split=True)
ax.set_ylim([-1,1])
ax.set_title('Holding Period Return vs. Moving Average 50-200 Order', fontsize = 18)

fig.tight_layout()

Comment:

Interestingly, whatever the sectors is, stocks with 50days moving average higher than 200days moving average always have higher average holding period return. So this is a good benchmark when picking the stocks.

Final thing I did in this part is to combine all ideas above to see if I could get a good portfolio of stocks which performed well in last half years. According to the correlation we should pick stocks with lower dividend yield, lower short ratio and stocks with 50days moving average higher than 200days moving average.

I do this by using the sort_value function so it could give me the stocks with lowest dividend yield as well as short ratio. I also use a boolean selection on spdi dataset to find stocks with 50days MA> 200days MA.



In [29]:

    
# I selected the top 10 stocks.
sp_top10 = spdi[spdi['50 Days Moving Average'] > spdi['200 Days Moving Average']].sort_values(['Dividend Yield', 'Short Ratio'],
                                                                                              ascending = True,
                                                                                              na_position='last').head(10)
sp_top10









    Out[29]:






  
    
      
      
      Sector
      50 Days Moving Average
      200 Days Moving Average
      Dividend Yield
      Short Ratio
      Peg Ratio
      Stock price on 2016-11-1
      Stock price on 2017-4-15
      Holding Period Return
      Sector Average Return
    
    
      Symbol
      Name
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      ETFC
      E*Trade
      Financials
      34.86
      33.70
      0.0
      1.19
      1.68
      28.100000
      33.220001
      0.182206
      0.146524
    
    
      MU
      Micron Technology
      Information Technology
      26.46
      21.85
      0.0
      1.83
      6.25
      16.980000
      26.740000
      0.574794
      0.120163
    
    
      ULTA
      Ulta Salon Cosmetics & Fragrance Inc
      Consumer Discretionary
      281.43
      262.30
      0.0
      1.87
      1.70
      240.179993
      282.679993
      0.176951
      0.057711
    
    
      ADBE
      Adobe Systems Inc
      Information Technology
      124.94
      112.64
      0.0
      2.25
      2.03
      106.870003
      129.050003
      0.207542
      0.120163
    
    
      ADSK
      Autodesk Inc
      Information Technology
      85.69
      78.79
      0.0
      3.60
      -5.92
      70.099998
      84.910004
      0.211270
      0.120163
    
    
      MYL
      Mylan N.V.
      Health Care
      41.49
      38.76
      0.0
      3.67
      0.74
      37.160000
      37.669998
      0.013724
      0.157534
    
    
      LH
      Laboratory Corp. of America Holding
      Health Care
      142.97
      134.29
      0.0
      3.75
      1.65
      123.830002
      142.100006
      0.147541
      0.157534
    
    
      UAL
      United Continental Holdings
      Industrials
      70.70
      67.50
      0.0
      3.79
      -1.54
      56.000000
      69.070000
      0.233393
      0.138182
    
    
      WAT
      Waters Corporation
      Health Care
      155.67
      147.07
      0.0
      4.53
      2.64
      137.429993
      155.690002
      0.132868
      0.157534
    
    
      RIG
      Transocean
      Energy
      12.70
      12.66
      0.0
      5.03
      -4.04
      9.590000
      11.720000
      0.222106
      0.027429



In [30]:

    
#plot their performance and their sector average performance
fig, ax = plt.subplots(figsize=(8, 8))
sp_top10.plot.barh(ax=ax, y="Holding Period Return", color="Blue")
sp_top10.plot.barh(ax=ax, y="Sector Average Return", color="Pink", alpha=0.6)
ax.vlines(0,-10,10, lw=3)
ax.set_title('Top10 S&P500 Stocks Selected, Holding Period Return vs. Sector Average HPR')
ax.set_xlabel('Returns')









    Out[30]:





<matplotlib.text.Text at 0x290b4cf1668>

Conclusion: It seems the portfolio is good. All stocks gain a positive return. 7 out of 10 stocks outperform their own sector. 8 out of 10 are stocks in hot sectors (Financials, Health Care, Industrials, Information Technology, Materials). I also get the stock with super return (57.5%) named Micron Technology. But the portfolio is also not perfect as the portfolio contains underperformer like Mylan N.V.

Trading Idea simulation

Below we examine two simple dynamic trading strategy.

The first trading strategy is a brand-new idea, aiming to explore the relationship between the daily trading volume change and the daily return of the stocks. Before I do the simulation, I think the price of the stock should increase with an increasing trading volume in a up market because there are more demand. It turned out to be a failed attempt.

The second trading strategy is inspired by the idea above. I could long the stocks when the short-term moving average is higher than the long-term moving average and get the daily return. When opposite situation, I don't hold any position.(I don't consider shorting here because shorting is costly) Also, if we consider the closing price as a moving average of super short period, we could also use the relation between closing price and moving average as a decision factor.

-First Trading Strategy - Volume Change based - Failed Attempt

Start by examine the relationship of daily volume change and daily return



In [31]:

    
# Get another timeseries data of stock, I used Apple stocks as an example('AAPL').
start = datetime.datetime(2016, 11, 1)
end = datetime.datetime(2017, 4, 15)
stock1 = web.DataReader('AAPL', 'yahoo', start, end)[['Close', 'Volume']]         #I only need Closing price and volume data
stock1.head()



In [32]:

    
stock1['Previous Close'] = stock1.Close.shift(1)                        # Use the shift(1) method to shift 1 space of the volume and price
stock1['Previous Volume'] = stock1.Volume.shift(1)
stock1['Volume Change'] = np.log(stock1['Volume']/ stock1['Previous Volume'])  # Calculate the daily return and the daily volume change
stock1['Daily Return'] = np.log(stock1['Close']/ stock1['Previous Close'])     
# I express the daily return and volume change as log return, which is more convenient when I sum the return 
stock1.head()









    Out[32]:






  
    
      
      Close
      Volume
      Previous Close
      Previous Volume
      Volume Change
      Daily Return
    
    
      Date
      
      
      
      
      
      
    
  
  
    
      2016-11-01
      111.489998
      43825800
      NaN
      NaN
      NaN
      NaN
    
    
      2016-11-02
      111.589996
      28331700
      111.489998
      43825800.0
      -0.436241
      0.000897
    
    
      2016-11-03
      109.830002
      26932600
      111.589996
      28331700.0
      -0.050644
      -0.015898
    
    
      2016-11-04
      108.839996
      30837000
      109.830002
      26932600.0
      0.135378
      -0.009055
    
    
      2016-11-07
      110.410004
      32560000
      108.839996
      30837000.0
      0.054369
      0.014322



In [33]:

    
# The scatter plot of volume change vs. Daily return, using plotly.
trace = go.Scatter(x = stock1['Volume Change'],
                   y = stock1['Daily Return'],
                   mode = 'markers'
                  )
layout = dict( title = 'Daily Volume Change vs. Price Change(Daily Return)',
              height = 600,
              width = 800,
             yaxis = dict(zeroline = True,
                         title = 'Daily Return'),
             xaxis = dict(zeroline=True,
                         title = 'Daily Volume Change')
             )
iplot(go.Figure(data=[trace], layout=layout))

The graph looks bad. I didn't discover any obvious correlations between the volume change and the return of stocks.

Below we do a statistical linear regression to see whether the two are correlated.



In [34]:

    
res = ols(y=stock1['Daily Return'], x=stock1['Volume Change'] )
res









    



D:\Program Files\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2847: FutureWarning:

The pandas.stats.ols module is deprecated and will be removed in a future version. We refer to external packages like statsmodels, see some examples here: http://statsmodels.sourceforge.net/stable/regression.html







    Out[34]:





-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         112
Number of Degrees of Freedom:   2

R-squared:         0.0177
Adj R-squared:     0.0088

Rmse:              0.0097

F-stat (1, 110):     1.9807, p-value:     0.1621

Degrees of Freedom: model 1, resid 110

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x     0.0036     0.0025       1.41     0.1621    -0.0014     0.0085
     intercept     0.0021     0.0009       2.33     0.0215     0.0003     0.0039
---------------------------------End of Summary---------------------------------

The t-stat of coefficient and R-squared are not big enough. Though we get a positive correlaton， it is a weak and unreliable.

Still, I still want to examine the return of strategy, that is, the return that one long the stock when volume increase and don't hold any position when volume decrease. Below are the test of the example stock.



In [35]:

    
print('The strategy return is' ,stock1[stock1['Volume Change']>0]['Daily Return'].sum())
print('The market return is' , stock1['Daily Return'].sum())        #









    



The strategy return is 0.19610995750258634
The market return is 0.235179575876408

I get a smaller return than the market return. It means the strategy failed on the sample. Below, I use the first 20 stocks as sample to test again my first strategy.



In [36]:

    
# define functions which could get the strategy return and the market return of a stock by inputting the symbol
# The function basically use the same procedure I did for Apple stocks.
def strategy1_return(stock):
    start = datetime.datetime(2016, 11, 1)
    end = datetime.datetime(2017, 4, 15)
    stock1 = web.DataReader(stock, 'yahoo', start, end)
    stock1 = stock1[['Close', 'Volume']]
    stock1['Previous Close'] = stock1.Close.shift(1)
    stock1['Previous Volume'] = stock1.Volume.shift(1)
    stock1['Volume Change'] = np.log(stock1['Volume']/ stock1['Previous Volume'])
    stock1['Daily Return'] = np.log(stock1['Close']/ stock1['Previous Close'])
    return stock1[stock1['Volume Change']>0]['Daily Return'].sum()

def market_return(stock):
    start = datetime.datetime(2016, 11, 1)
    end = datetime.datetime(2017, 4, 15)
    stock1 = web.DataReader(stock, 'yahoo', start, end)
    stock1 = stock1[['Close', 'Volume']]
    stock1['Previous Close'] = stock1.Close.shift(1)
    stock1['Previous Volume'] = stock1.Volume.shift(1)
    stock1['Volume Change'] = np.log(stock1['Volume']/ stock1['Previous Volume'])
    stock1['Daily Return'] = np.log(stock1['Close']/ stock1['Previous Close'])
    return stock1['Daily Return'].sum()
# to insure that my functions are right, print out the returns for apple stocks.
print('The strategy return is', strategy1_return('AAPL'))
print('The market return is', market_return('AAPL'))









    



The strategy return is 0.19610995750258634
The market return is 0.235179575876408



In [37]:

    
# get the symbol_list from the spdi dataset
symbol_list = list(spdi.reset_index()['Symbol'].astype(str))
# build two list to contains returns for strategy return and market return. It might takes a while to finish running since there are 20 stocks.
lists1 = []
listm1 = []
for symbol in symbol_list[0:20]:
    lists1.append(strategy1_return(symbol))                        # for each stock symbol get the strategy return and add it into the list
    listm1.append(market_return(symbol))                        # for each stock symbol get the market return and add it into the list



In [38]:

    
# create new data frame for this strategy in order to plot
strategy1 = pd.DataFrame({'Strategy Return': lists1,
                         'Market Return': listm1},
                        index = symbol_list[0:20])
strategy1.head(3)









    Out[38]:






  
    
      
      Market Return
      Strategy Return
    
  
  
    
      MMM
      0.138504
      0.069809
    
    
      ABT
      0.088653
      -0.079367
    
    
      ABBV
      0.127557
      0.063072



In [39]:

    
# use plotly to plot the bar chart of market return vs. strategy return
mr = dict(type = 'bar',
          name = 'Market Return',
          y = strategy1['Market Return'],
          x = strategy1.index,
          marker = dict(color = 'rgb(200,200,200)'),
          opacity=0.4
        )
sr = dict(type = 'bar',
          name = 'Strategy Return',
          y = strategy1['Strategy Return'],
          x = strategy1.index,
          marker = dict(color = 'rgb(100,10,100)'),
          opacity = 0.4
         )
layout = dict(width=850, height=500,                        
              yaxis={'title': 'Return'},                    
              title='Market Return vs. Strategy Return',            
              xaxis={'title': 'Sample Company Symbol'}
         )
iplot(go.Figure(data=[mr,sr], layout=layout))

Comment:

Obviously , the first strategy failed because it can not insure us to get better return than the market. There are lots of stocks in the chart underperforming the market.

-Second Trading Strategy - Moving Average Based - Success

To restate the idea, the trader long the stocks when the short term moving average is higher than the long term moving average. Specifically, I long the stock when the stock price (super short term moving average) is greater than 5 days moving average, and the 5 days moving average is greater than the 20days moving average. I pick the 5 days and 20days simply because our testing period is only half years. One can test the 50days and 200days for longer period.



In [40]:

    
# Get another timeseries data of stock, still used Apple stocks as an example('AAPL').
start = datetime.datetime(2016, 11, 1)
end = datetime.datetime(2017, 4, 15)
stock2 = web.DataReader('AAPL', 'yahoo', start, end)[['Close']]         #This time I only need  closing price  
stock2.head()



In [41]:

    
# Calculate the 5/20 days moving average using the rolling method. 
stock2['5 Days Moving Average'] = stock2['Close'].rolling(window=5).mean()         
stock2['20 Days Moving Average'] = stock2['Close'].rolling(window=20).mean()
stock2['Previous Close'] = stock2.Close.shift(1)
stock2['Daily Return'] = np.log(stock2['Close']/ stock2['Previous Close'])      #Similar to the first strategy, calculate daily log return
stock3 = stock2.dropna(axis = 0, how='any') #drop several nan data in the beginning caused by rolling function, create a new dataset stock3
stock3.head()                         # The new timeseries data begin on November 29,2016









    Out[41]:






  
    
      
      Close
      5 Days Moving Average
      20 Days Moving Average
      Previous Close
      Daily Return
    
    
      Date
      
      
      
      
      
    
  
  
    
      2016-11-29
      111.459999
      111.570001
      110.1360
      111.570000
      -0.000986
    
    
      2016-11-30
      110.519997
      111.314000
      110.0875
      111.459999
      -0.008469
    
    
      2016-12-01
      109.489998
      110.965999
      109.9825
      110.519997
      -0.009363
    
    
      2016-12-02
      109.900002
      110.587999
      109.9860
      109.489998
      0.003738
    
    
      2016-12-05
      109.110001
      110.095999
      109.9995
      109.900002
      -0.007214



In [42]:

    
# time series plot of closing price and 5/20 days moving average, plotly
close = go.Scatter(
                x=stock3.index,
                y=stock3['Close'] ,
                name = 'Closing Price',
                line = dict(color = 'black'),
                opacity = 0.8)
MA5 = go.Scatter(
                x=stock3.index,
                y=stock3['5 Days Moving Average'] ,
                name = '5 Days Moving Average',
                line = dict(color = 'rgb(0,125,255)'),
                opacity = 0.8)

MA20 = go.Scatter(
                x=stock3.index,
                y=stock3['20 Days Moving Average'],
                name = '20 Days Moving Average',
                line = dict(color = 'rgb(255,0,0)'),
                opacity = 0.8)


layout = dict(
    title = 'Time Series Plot of Stock Price and 5/20 Days Moving Average',
    yaxis={'title': 'Price'},
    xaxis={'title': 'Date'}
)

iplot(dict(data=[close,MA5, MA20], layout=layout))



In [43]:

    
# Here we define our benchmark for longing the stock, and similarly we calculate our new strategy return and market return
benchmark = (stock3['5 Days Moving Average'] > stock3['20 Days Moving Average']) | (stock3['Close']> stock3['5 Days Moving Average'])
print('The strategy return is', stock3[benchmark]['Daily Return'].sum())
print('The market return is', stock3['Daily Return'].sum())









    



The strategy return is 0.23976543864949776
The market return is 0.23446226211197094



In [44]:

    
# similarly, we define another two function for the second strategy
def strategy2_return(stock):
    start = datetime.datetime(2016, 11, 1)
    end = datetime.datetime(2017, 4, 15)
    stock2 = web.DataReader(stock, 'yahoo', start, end)[['Close']]
    stock2['5 Days Moving Average'] = stock2['Close'].rolling(window=5).mean()
    stock2['20 Days Moving Average'] = stock2['Close'].rolling(window=20).mean()
    stock2['Previous Close'] = stock2.Close.shift(1)
    stock2['Daily Return'] = np.log(stock2['Close']/ stock2['Previous Close'])
    stock3 = stock2.dropna(axis = 0, how='any')
    benchmark = (stock3['5 Days Moving Average'] > stock3['20 Days Moving Average']) | (stock3['Close']> stock3['5 Days Moving Average'])
    return stock3[benchmark]['Daily Return'].sum()

def market2_return(stock):
    start = datetime.datetime(2016, 11, 1)
    end = datetime.datetime(2017, 4, 15)
    stock2 = web.DataReader(stock, 'yahoo', start, end)[['Close']]
    stock2['5 Days Moving Average'] = stock2['Close'].rolling(window=5).mean()
    stock2['20 Days Moving Average'] = stock2['Close'].rolling(window=20).mean()
    stock2['Previous Close'] = stock2.Close.shift(1)
    stock2['Daily Return'] = np.log(stock2['Close']/ stock2['Previous Close'])
    stock3 = stock2.dropna(axis = 0, how='any')
    benchmark = (stock3['5 Days Moving Average'] > stock3['20 Days Moving Average']) | (stock3['Close']> stock3['5 Days Moving Average'])
    return stock3['Daily Return'].sum()
# to insure that my functions are right, print out the returns for apple stocks.
print('The strategy return is', strategy2_return('AAPL'))
print('The market return is', market2_return('AAPL'))









    



The strategy return is 0.23976543864949776
The market return is 0.23446226211197094

Steps below are quite similar to the first strategy, I use the symbol_list I derived in the first trading strategy simulation.



In [45]:

    
lists2=[]
listm2=[]
# build two list to contains returns for strategy return and market return. It might takes a while to finish running since there are 20 stocks.
for symbol in symbol_list[0:20]:
    lists2.append(strategy2_return(symbol))
    listm2.append(market2_return(symbol))



In [46]:

    
# create new data frame for this strategy in order to plot
strategy2 = pd.DataFrame({'Strategy Return': lists2,
                         'Market Return': listm2},
                        index = symbol_list[0:20])
strategy2.head(3)









    Out[46]:






  
    
      
      Market Return
      Strategy Return
    
  
  
    
      MMM
      0.089496
      0.135675
    
    
      ABT
      0.096623
      0.186077
    
    
      ABBV
      0.075608
      0.193142



In [47]:

    
mr2 = dict(type = 'bar',
          name = 'Market Return',
          y = strategy2['Market Return'],
          x = strategy2.index,
          marker = dict(color = 'rgb(200,200,200)'),
          opacity=0.4
        )
sr2 = dict(type = 'bar',
          name = 'Moving Average Strategy Return',
          y = strategy2['Strategy Return'],
          x = strategy2.index,
          marker = dict(color = 'rgb(100,10,100)'),
          opacity = 0.4
         )
layout = dict(width=850, height=500,                        
              yaxis={'title': 'Return'},                    
              title='Market Return vs. Moving Average Strategy Return',            
              xaxis={'title': 'Sample Company Symbol'}
         )
iplot(go.Figure(data=[mr2,sr2], layout=layout))

Comment:

We get a great plot! All the stocks we used in the graph get possitive returns by using our second strategy based on moving average. Also, they outperform the market return. Some (such as AYI, AAP) even get a positive return when the market return is negative.

In reality, there are some difficulties to use this strategy. One problem is, there is always a lag between the time you discover the pattern of the moving average (stock price> 5days MA> 20 days MA) and the time when you build your position. So in real life the return would be lower than the return I get above.

Summary

All in all, the exploration is meaningful. Though there are some failures like PEG ratio and volume based trading strategy. I get some valuable discovery in this notebook. For a trader or investor, it might be better to pick stocks in hot sectors. One should be aware of the meaningful correlations between the short ratio and return, the dividend yield and return. One could also take advantage of the moving average order, as I showed both in the analysis part and trading simulation part.

Thanks for reading!



In [ ]:

	Symbol	Name	Sector
0	MMM	3M Company	Industrials
1	ABT	Abbott Laboratories	Health Care
2	ABBV	AbbVie	Health Care
3	ACN	Accenture plc	Information Technology
4	ATVI	Activision Blizzard	Information Technology
5	AYI	Acuity Brands Inc	Industrials
6	ADBE	Adobe Systems Inc	Information Technology
7	AAP	Advance Auto Parts	Consumer Discretionary

	Symbol	Name	Sector	50 Days Moving Average	200 Days Moving Average	Dividend Yield	Short Ratio	Peg Ratio	Stock price on 2016-11-1	Stock price on 2017-4-15	Holding Period Return
0	MMM	3M Company	Industrials	190.17	178.69	2.49	4.03	2.46	164.250000	188.649994	0.148554
1	ABT	Abbott Laboratories	Health Care	44.76	41.60	2.48	1.65	1.59	39.049999	42.669998	0.092702
2	ABBV	AbbVie	Health Care	64.69	62.20	3.99	4.90	0.80	56.450001	64.129997	0.136050
3	ACN	Accenture plc	Information Technology	121.49	118.87	2.11	4.22	2.08	116.139999	114.860001	-0.011021
4	ATVI	Activision Blizzard	Information Technology	48.50	42.49	0.62	2.58	1.54	42.599998	48.419998	0.136620

		Sector	50 Days Moving Average	200 Days Moving Average	Dividend Yield	Short Ratio	Peg Ratio	Stock price on 2016-11-1	Stock price on 2017-4-15	Holding Period Return
Symbol	Name
MMM	3M Company	Industrials	190.17	178.69	2.49	4.03	2.46	164.250000	188.649994	0.148554
ABT	Abbott Laboratories	Health Care	44.76	41.60	2.48	1.65	1.59	39.049999	42.669998	0.092702
ABBV	AbbVie	Health Care	64.69	62.20	3.99	4.90	0.80	56.450001	64.129997	0.136050
ACN	Accenture plc	Information Technology	121.49	118.87	2.11	4.22	2.08	116.139999	114.860001	-0.011021
ATVI	Activision Blizzard	Information Technology	48.50	42.49	0.62	2.58	1.54	42.599998	48.419998	0.136620

		Sector	50 Days Moving Average	200 Days Moving Average	Dividend Yield	Short Ratio	Peg Ratio	Stock price on 2016-11-1	Stock price on 2017-4-15	Holding Period Return	Sector Average Return
Symbol	Name
ETFC	E*Trade	Financials	34.86	33.70	0.0	1.19	1.68	28.100000	33.220001	0.182206	0.146524
MU	Micron Technology	Information Technology	26.46	21.85	0.0	1.83	6.25	16.980000	26.740000	0.574794	0.120163
ULTA	Ulta Salon Cosmetics & Fragrance Inc	Consumer Discretionary	281.43	262.30	0.0	1.87	1.70	240.179993	282.679993	0.176951	0.057711
ADBE	Adobe Systems Inc	Information Technology	124.94	112.64	0.0	2.25	2.03	106.870003	129.050003	0.207542	0.120163
ADSK	Autodesk Inc	Information Technology	85.69	78.79	0.0	3.60	-5.92	70.099998	84.910004	0.211270	0.120163
MYL	Mylan N.V.	Health Care	41.49	38.76	0.0	3.67	0.74	37.160000	37.669998	0.013724	0.157534
LH	Laboratory Corp. of America Holding	Health Care	142.97	134.29	0.0	3.75	1.65	123.830002	142.100006	0.147541	0.157534
UAL	United Continental Holdings	Industrials	70.70	67.50	0.0	3.79	-1.54	56.000000	69.070000	0.233393	0.138182
WAT	Waters Corporation	Health Care	155.67	147.07	0.0	4.53	2.64	137.429993	155.690002	0.132868	0.157534
RIG	Transocean	Energy	12.70	12.66	0.0	5.03	-4.04	9.590000	11.720000	0.222106	0.027429

	Close	Volume
Date
2016-11-01	111.489998	43825800
2016-11-02	111.589996	28331700
2016-11-03	109.830002	26932600
2016-11-04	108.839996	30837000
2016-11-07	110.410004	32560000

	Close	5 Days Moving Average	20 Days Moving Average	Previous Close	Daily Return
Date
2016-11-29	111.459999	111.570001	110.1360	111.570000	-0.000986
2016-11-30	110.519997	111.314000	110.0875	111.459999	-0.008469
2016-12-01	109.489998	110.965999	109.9825	110.519997	-0.009363
2016-12-02	109.900002	110.587999	109.9860	109.489998	0.003738
2016-12-05	109.110001	110.095999	109.9995	109.900002	-0.007214